Optimality parsing and local cost functions in Discontinuous Grammar
Author
Abstract
After a brief survey of Discontinuous Grammar (DG), we propose local cost functions as a generalization of violable constraints in OT, probabilities in PCFG, and processing times in distributed competition models. We demonstrate how local cost functions can be used in DG to encode violable constraints on word order, landing sites, islands, agreement, and selectional restrictions, as well as lexical frequencies and pragmatic preferences. Rephrasing parsing as an optimization problem in which a large space of partial parses is searched for a minimum-cost solution, we propose a heuristic parsing algorithm for DG based on local search, with worst-case complexity O(n log^4 n) given linguistically reasonable assumptions on tree depth and island constraints, and O(n^5) without any assumptions. The proposed algorithm works in the presence of local ambiguity, produces the right analysis for a wide range of discontinuous word-order phenomena such as topicalizations, relativizations, extractions, scramblings, and discontinuous APs, and fails on garden-path sentences, just like humans.
1 Discontinuous Grammar
Discontinuous Grammar [10] (or DG) is a syntax formalism that was originally conceived as a synthesis between Head-Driven Phrase Structure Grammar [15], Government & Binding [5], and German dependency grammar [6]. Subsequently, it has borrowed ideas from Optimality Theory [8], Tree-Adjoining Grammar [1], ...
Footnote 1: Email: [email protected]. Web: www.id.cbs.dk/∼mtk/dg
Footnote 2: I wish to thank my thesis advisor, Professor Carl Vikner, to whom this paper is dedicated. Thanks also to Richard Hudson, Don C. Mitchell, Owen Rambow, Lars Konieczny, Line Mikkelsen, Sten Vikner, Sabine Kirchmeyer-Andersen, and Alex Klinge for fruitful discussions, and to two anonymous reviewers for highly valuable comments. This work was made possible by a grant from the Danish Research Council for the Humanities.
©2001 Published by Elsevier Science B.V.
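The abstract's view of parsing as a search for a minimum-cost analysis can be illustrated with a small sketch. This is not the algorithm of the paper, and it makes no attempt to reach the complexity bounds quoted above. It assumes a simplified setting in which an analysis is a list of head indices, the total cost is a sum of one local cost per word, and the search greedily reattaches one word at a time. All names (`ROOT`, `local_search`, `toy_cost`) and all cost values are hypothetical.

```python
from typing import Callable, List

ROOT = -1  # hypothetical marker for an artificial root node

Cost = Callable[[int, int], float]  # cost(head, dependent)


def total_cost(heads: List[int], cost: Cost) -> float:
    """Sum of the local costs, one per word (assumed decomposition)."""
    return sum(cost(h, d) for d, h in enumerate(heads))


def creates_cycle(heads: List[int], dep: int, new_head: int) -> bool:
    """Walk upward from new_head; reaching dep means a cycle would form."""
    node = new_head
    while node != ROOT:
        if node == dep:
            return True
        node = heads[node]
    return False


def local_search(n: int, cost: Cost) -> List[int]:
    """Greedy local search: reattach single words while the cost drops."""
    heads = [ROOT] * n  # start with every word attached to the root
    improved = True
    while improved:
        improved = False
        for dep in range(n):
            for new_head in [ROOT] + list(range(n)):
                if new_head == dep or new_head == heads[dep]:
                    continue
                if creates_cycle(heads, dep, new_head):
                    continue
                # Costs are local, so only this word's term changes.
                if cost(new_head, dep) < cost(heads[dep], dep):
                    heads[dep] = new_head
                    improved = True
    return heads


# Toy sentence "the dog barks" with invented costs.
PREFERRED = {(1, 0): 0.0, (2, 1): 0.0, (ROOT, 2): 0.0}


def toy_cost(head: int, dep: int) -> float:
    if (head, dep) in PREFERRED:
        return PREFERRED[(head, dep)]
    return 1.0 if head == ROOT else 2.0


result = local_search(3, toy_cost)
print(result, total_cost(result, toy_cost))
```

Each accepted move strictly lowers the total cost, so the loop terminates, but like any greedy local search it may stop in a local minimum rather than at the globally cheapest analysis.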
Similar resources
Discontinuous Parsing with an Efficient and Accurate DOP Model
We present a discontinuous variant of tree-substitution grammar (TSG) based on Linear Context-Free Rewriting Systems. We use this formalism to instantiate a Data-Oriented Parsing model applied to discontinuous treebank parsing, and obtain a significant improvement over earlier results for this task. The model induces a TSG from the treebank by extracting fragments that occur at least twice. We g...
Discontinuous Data-Oriented Parsing: A mildly context-sensitive all-fragments grammar
Recent advances in parsing technology have made treebank parsing with discontinuous constituents possible, with parser output of competitive quality (Kallmeyer and Maier, 2010). We apply Data-Oriented Parsing (DOP) to a grammar formalism that allows for discontinuous trees (LCFRS). Decisions during parsing are conditioned on all possible fragments, resulting in improved performance. Despite the...
PLCFRS Parsing of English Discontinuous Constituents
This paper proposes a direct parsing of non-local dependencies in English. To this end, we use probabilistic linear context-free rewriting systems for data-driven parsing, following recent work on parsing German. In order to do so, we first perform a transformation of the Penn Treebank annotation of non-local dependencies into an annotation using crossing branches. The resulting treebank can be...
Grammars for Local and Long Dependencies
Polarized dependency (PD-) grammars are proposed as a means of efficient treatment of discontinuous constructions. PD-grammars describe two kinds of dependencies: local, explicitly derived by the rules, and long, implicitly specified by negative and positive valencies of words. If in a PD-grammar the number of non-saturated valencies in derived structures is bounded by a constant, then it is w...
Discontinuous Data-Oriented Parsing through Mild Context-Sensitivity
It has long been argued that incorporating a notion of discontinuity in phrase-structure is desirable, given phenomena such as topicalization and extraposition, and particular features of languages such as cross-serial dependencies in Dutch and the German Mittelfeld. Up until recently this was mainly a theoretical topic, but advances in parsing technology have made treebank parsing with discont...
Journal title:
- Electr. Notes Theor. Comput. Sci.
Volume 53, Issue
Pages -
Publication date 2001